Agentic Misalignment: How LLMs could be insider threats
https://www.anthropic.com/research/agentic-misalignment, posted 9 Aug by peter in ai science toread
In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals—including blackmailing officials and leaking sensitive information to competitors. We call this phenomenon agentic misalignment.